Benchmarking the semi-supervised naïve Bayes classifier
Authors
Abstract
Semi-supervised learning involves constructing predictive models with both labelled and unlabelled training data. The need for semi-supervised learning is driven by the fact that unlabelled data are often easy and cheap to obtain, whereas labelling data requires costly and time-consuming human intervention and expertise. Semi-supervised methods commonly use self training, which involves using the labelled data to predict labels for the unlabelled data, then iteratively reconstructing classifiers using the predicted labels. Our aim is to determine whether self training actually improves classifier performance. Expectation maximization is a commonly used self training scheme. We investigate whether an expectation maximization scheme improves a naïve Bayes classifier through experimentation with 30 discrete and 20 continuous real-world benchmark UCI datasets. Rather surprisingly, we find that in practice the self training actually makes the classifier worse. The cause of this detrimental effect on performance could lie either with the self training scheme itself or with how self training works in conjunction with the classifier. Our hypothesis is the latter: violating the naïve Bayes assumption that attributes are independent means predictive errors propagate through the self training scheme. To test whether this is the case, we generate simulated data with the same attribute distribution as the UCI data, but where the attributes are independent. Experiments with this data demonstrate that semi-supervised learning does improve performance, leading to significantly more accurate classifiers. These results demonstrate that semi-supervised learning cannot be applied blindly without considering the nature of the classifier, because the assumptions implicit in the classifier may result in a degradation in performance.
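The abstract does not spell out the exact procedure, so the following is only a minimal sketch of one common way to combine expectation maximization with a naïve Bayes classifier. It assumes discrete attributes encoded as integers, Laplace smoothing, soft (probabilistic) labels for the unlabelled data, and a fixed number of iterations. The names `fit_nb`, `predict_proba`, `em_naive_bayes` and `make_independent_data` are hypothetical, and the toy data is synthetic (binary attributes that are independent given the class), not the UCI data used in the paper.

```python
import numpy as np

def fit_nb(X, R, n_values, alpha=1.0):
    """Fit a categorical naive Bayes model from soft class weights R (n x k)."""
    n_classes = R.shape[1]
    prior = (R.sum(axis=0) + alpha) / (R.sum() + alpha * n_classes)
    cond = []  # cond[j][c, v] = P(attribute j = v | class c), Laplace-smoothed
    for j, m in enumerate(n_values):
        onehot = np.eye(m)[X[:, j]]        # n x m indicator of observed values
        counts = R.T @ onehot + alpha      # k x m weighted counts
        cond.append(counts / counts.sum(axis=1, keepdims=True))
    return prior, cond

def predict_proba(X, prior, cond):
    """Class posteriors under the attribute-independence assumption."""
    log_p = np.tile(np.log(prior), (X.shape[0], 1))
    for j, table in enumerate(cond):
        log_p += np.log(table[:, X[:, j]]).T   # add log P(x_j | class)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def em_naive_bayes(X_lab, y_lab, X_unl, n_classes, n_values, n_iter=10):
    """Start from the labelled data, then alternate E- and M-steps."""
    R_lab = np.eye(n_classes)[y_lab]           # labelled weights stay fixed
    prior, cond = fit_nb(X_lab, R_lab, n_values)
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        R_unl = predict_proba(X_unl, prior, cond)             # E-step
        R_all = np.vstack([R_lab, R_unl])
        prior, cond = fit_nb(X_all, R_all, n_values)          # M-step
    return prior, cond

def make_independent_data(rng, n, n_attrs):
    """Synthetic binary attributes, independent given a binary class."""
    y = rng.integers(0, 2, n)
    p_one = np.where(y == 1, 0.9, 0.1)         # P(attribute = 1 | class)
    X = (rng.random((n, n_attrs)) < p_one[:, None]).astype(int)
    return X, y

rng = np.random.default_rng(0)
X, y = make_independent_data(rng, 520, 5)
X_lab, y_lab, X_unl, y_unl = X[:20], y[:20], X[20:], y[20:]
prior, cond = em_naive_bayes(X_lab, y_lab, X_unl, n_classes=2, n_values=[2] * 5)
acc = (predict_proba(X_unl, prior, cond).argmax(axis=1) == y_unl).mean()
print(f"accuracy on the unlabelled pool: {acc:.3f}")
```

Because the toy attributes satisfy the independence assumption by construction, this sketch corresponds to the simulated-data setting described in the abstract rather than to the real-world UCI experiments, where correlated attributes are hypothesised to let prediction errors propagate through the iterations.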
Similar resources
Application of Data Mining Using Bayesian Belief Network To Classify Quality of Web Services
In this paper, we employed Naïve Bayes, Augmented Naïve Bayes, Tree Augmented Naïve Bayes, Sons & Spouses, Markov Blanket, Augmented Markov Blanket, Semi Supervised and Bayesian network techniques to rank web services. The Bayesian Network is demonstrated on a dataset taken from literature. The dataset consists of 364 web services whose quality is described by 9 attributes. Here, the attributes...
Image Classification Using Naïve Bayes Classifier
An image classification scheme using Naïve Bayes Classifier is proposed in this paper. The proposed Naive Bayes Classifier-based image classifier can be considered as the maximum a posteriori decision rule. The Naïve Bayes Classifier can produce very accurate classification results with a minimum training time when compared to conventional supervised or unsupervised learning algorithms. Compreh...
Proposed Techniques to Remove Flaming Problems from Social Networking Sites and outcome of Naïve Bayes Classifier for Detection of Flames
Natural Language Processing (NLP)[1][2][5] is a field of Computer Science concerned with the interactions between Computer and Human (Natural) Languages. Social Networking Sites are amongst the most effective communication tools nowadays. But they have also given rise to the problem of flaming, which is difficult to deal with. A flaming incident is triggered by comments and actions of users in SNS tha...
Classification Using Naïve Bayes- a Survey
Classification, particularly Text Classification, is a supervised learning approach categorizing into various categories, the available training set of correctly identified observations analyzed into a set of features. There are many phases involved in classification. The main classification phase involves the use of classification algorithms or classifiers. Among the various classifiers, the N...
Semi-supervised Learning Based Aesthetic Classifier for Short Animations Embedded in Web Pages
We propose a semi-supervised learning based computational model for aesthetic classification of short animation videos, which are nowadays part of many web pages. The proposed model is expected to be useful in developing an overall aesthetic model of web pages, leading to better evaluation of web page usability. We identified two feature sets describing aesthetics of an animated video. Based on...